Peter Hase

Reasoning Models Don't Always Say What They Think

May 08, 2025

Unlearning Sensitive Information in Multimodal LLMs: Benchmark and Attack-Defense Evaluation

May 01, 2025

Teaching Models to Balance Resisting and Accepting Persuasion

Oct 18, 2024

System-1.x: Learning to Balance Fast and Slow Planning with Language Models

Jul 19, 2024

Fundamental Problems With Model Editing: How Should Rational Belief Revision Work in LLMs?

Jun 27, 2024

Are language models rational? The case of coherence norms and belief revision

Jun 05, 2024

LACIE: Listener-Aware Finetuning for Confidence Calibration in Large Language Models

May 31, 2024

Foundational Challenges in Assuring Alignment and Safety of Large Language Models

Apr 15, 2024

Rethinking Machine Unlearning for Large Language Models

Feb 15, 2024

The Unreasonable Effectiveness of Easy Training Data for Hard Tasks

Jan 12, 2024